A multi-stage approach to clustering and imputation of gene expression profiles

Authors

  • Dorothy S. V. Wong
  • Frederick K. Wong
  • Graham R. Wood
Abstract

MOTIVATION: Microarray experiments have revolutionized the study of gene expression with their ability to generate large amounts of data. This article describes an alternative to existing approaches to clustering of gene expression profiles; the key idea is to cluster in stages using a hierarchy of distance measures. This method is motivated by the way in which the human mind sorts and so groups many items. The distance measures arise from the orthogonal breakup of Euclidean distance, giving us a set of independent measures of different attributes of the gene expression profile. Interpretation of these distances is closely related to the statistical design of the microarray experiment. This clustering method not only accommodates missing data but also leads to an associated imputation method.

RESULTS: The performance of the clustering and imputation methods was tested on a simulated dataset, a yeast cell cycle dataset and a central nervous system development dataset. Based on the Rand and adjusted Rand indices, the clustering method is more consistent with the biological classification of the data than commonly used clustering methods. The imputation method, at varying levels of missingness, outperforms most imputation methods, based on root mean squared error (RMSE).

AVAILABILITY: Code in R is available on request from the authors.
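The ideas named in the abstract can be illustrated with a small sketch. It is written in Python rather than the authors' R, and it is not their implementation: it assumes the simplest possible orthogonal breakup, in which the squared Euclidean distance between two profiles is split into a "level" part (difference in overall mean) and a "shape" part (everything orthogonal to the mean). The paper ties its breakup to the design of the microarray experiment, which may yield more components than these two. The function names `orthogonal_breakup`, `rand_index` and `rmse` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def orthogonal_breakup(x, y):
    """Split the squared Euclidean distance between two profiles.

    Assumed decomposition (illustrative only):
      level = n * mean(d)^2          projection of d onto the constant vector
      shape = sum((d - mean(d))^2)   remainder, orthogonal to the constant vector
    The two parts add up to the full squared distance sum(d^2).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    level = d.size * d.mean() ** 2
    shape = float(np.sum((d - d.mean()) ** 2))
    return float(level), shape


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (plain Rand index)."""
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    same_a = a[:, None] == a[None, :]   # pair is co-clustered in partition a
    same_b = b[:, None] == b[None, :]   # pair is co-clustered in partition b
    upper = np.triu_indices(a.size, k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


def rmse(true_values, imputed_values):
    """Root mean squared error between held-out true values and imputed values."""
    t = np.asarray(true_values, dtype=float)
    p = np.asarray(imputed_values, dtype=float)
    return float(np.sqrt(np.mean((t - p) ** 2)))


# Two profiles with the same shape but shifted level:
print(orthogonal_breakup([1, 2, 3, 4], [2, 3, 4, 5]))   # (4.0, 0.0)
# Two profiles with the same mean but opposite trends:
print(orthogonal_breakup([1, 2, 3, 4], [4, 3, 2, 1]))   # (0.0, 20.0)
# Identical partitions under different label names:
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))           # 1.0
```

Under this assumed breakup, a staged clustering could first group profiles on one component (for example shape) and then subdivide each group on the other, which is one way to read "a hierarchy of distance measures".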

Similar resources

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging problem that needs careful consideration before learning from or predicting time series. Much research has been done on the use of diffe...

Investigating the effects of human papillomavirus-induced changes in cellular microRNA expression in head and neck squamous cell carcinoma at the gene expression profile level

Background and aim: Human papillomavirus plays an important role in some human malignancies and alters the normal expression levels of cellular microRNAs. In this paper, we evaluated the effects of such changes on head and neck squamous cell carcinoma tumor samples at the gene expression profile level. Methods: In this descriptive-analytical study, gene expression profiles of 36 tum...

Investigation of Histone Lysine-Specific Demethylase 5D (KDM5D) Isoform Expression in Prostate Cancer Cell Lines: a System Approach

Background: It is now well-demonstrated that histone demethylases play an important role in developmental controls, cell-fate decisions, and a variety of diseases such as cancer. Lysine-specific demethylase 5D (KDM5D) is a male-specific histone demethylase that specifically demethylates di- and tri-methyl H3K4 at the start site of active genes. In this light, the aim of this study was to invest...

Multivariate Feature Extraction for Prediction of Future Gene Expression Profile

Introduction: The features of a cell can be extracted from its gene expression profile. If the gene expression profiles of future descendant cells are predicted, the features of the future cells are also predicted. The objective of this study was to design an artificial neural network to predict gene expression profiles of descendant cells that will be generated by division/differentiation of h...

Application of clustering methods in DNA microarrays

Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

Journal title:
  • Bioinformatics

Volume 23, Issue 8

Pages: -

Publication date: 2007